Introduction
What is the bugRzilla Package?
BugRzilla is an R package that helps the user to interact with the Bugzilla through an API.
To learn more, see bugRzilla.
About the BugRzilla Google Summer of COde Project:-
The project bugRzilla is an issue tracker for the R-core members. The goal of the project is to help users to submit issues to R Bugzilla.
About the This Project:-
Explore the issues and bugs on the R Bugzilla to make the submission from bugRzilla better. It might help to identify useful patterns for R core or report the status of the R Bugzilla.
To learn more, see bugzilla_viz.
Setup Database on your local system
Download SQL and MySQL Workbench
To install SQL on Ubuntu one can refer a blog post by digitalocean. To install MySQL workbench on Ubuntu one can refer a blog post by linuxhint
Download R_bugzilla data
- The R_bugzilla data can be downloaded from link.
-
Since the downloaded data is a zip file so make sure you unzip the file before dumping the file which will have an extension
.sql(eg: R-bugs.sql).
Dump downloaded R_bugzilla to MySQL workbench.
Before one import the R_bugzilla SQL file one may need to create the (empty) database from MySQL if it doesn’t exist already and the exported SQL don’t contain CREATE DATABASE (exported with –no-create-db or -n option), before you can import it.
After considering this open your Terminal and run the command: mysqldump -u my_username -p database_name > output_file_path
-
The
-uflag indicates that the MySQLusernamewill follow. -
The
-pflag indicates we should be prompted for thepasswordassociated with the above username.database_nameis of course the exact name of the database to export. -
The
>symbol is a Unix directive forSTDOUT, which allows Unix commands to output the text results of the issued command to another location. In this case, that output location is a file path, specified byoutput_file_path.
bugRzilla Analysis
For the connection to the database, I’m using the dplyr package, it supports to the widely-used open source databases like MySQL.
The libraries used for the analysis:
# loading packages
library(dplyr)##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(dbplyr)##
## Attaching package: 'dbplyr'
## The following objects are masked from 'package:dplyr':
##
## ident, sql
library(RMySQL)## Loading required package: DBI
library(DBI)
library(DT)
library(tidyverse)## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.2 ✓ stringr 1.4.0
## ✓ tidyr 1.1.3 ✓ forcats 0.5.1
## ✓ readr 1.4.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dbplyr::ident() masks dplyr::ident()
## x dplyr::lag() masks stats::lag()
## x dbplyr::sql() masks dplyr::sql()
library(ggplot2)
library(plotly)##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
Connect bugRzilla SQL Database with R
# Connecting R with MySQL
con <- dbConnect(
MySQL(),
dbname='bugRzilla', # change the database name to your database name
username='root', # change the username to your username
password='1204', # update your password
host='localhost',
port=3306)
# Accessing Tables names from the Database
DBI::dbListTables(con)## [1] "attachments" "bugs" "bugs_activity" "bugs_fulltext"
## [5] "bugs_mod" "components" "longdescs"
Data Exploartion of Bugs Table from the Database
bugs_df <- tbl(con, "bugs")## Warning in .local(conn, statement, ...): Decimal MySQL column 24 imported as
## numeric
## Warning in .local(conn, statement, ...): Decimal MySQL column 25 imported as
## numeric
#for quick view of the datatypes and the structure of data
glimpse(bugs_df)## Rows: ??
## Warning in .local(conn, statement, ...): Decimal MySQL column 24 imported as
## numeric
## Warning in .local(conn, statement, ...): Decimal MySQL column 25 imported as
## numeric
## Columns: 27
## Database: mysql 8.0.26-0ubuntu0.20.04.2 [@localhost:/bugRzilla]
## $ bug_id <int> 1, 3, 4, 5, 7, 8, 9, 10, 12, 13, 14, 15, 18, 19, 2…
## $ assigned_to <int> 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ bug_file_loc <chr> "http://url.com/", "", "", "", "", "", "", "", "",…
## $ bug_severity <chr> "normal", "normal", "normal", "normal", "normal", …
## $ bug_status <chr> "CLOSED", "CLOSED", "CLOSED", "CLOSED", "CLOSED", …
## $ creation_ts <chr> "2010-02-15 12:29:54", "1998-08-07 22:25:05", "199…
## $ delta_ts <chr> "2018-01-16 10:21:14", "1998-08-09 21:50:29", "199…
## $ short_desc <chr> "Test bug report - summary", "[Prof Brian Ripley <…
## $ op_sys <chr> "Mac OS X v10.4", "All", "Solaris", "Solaris", "Ot…
## $ priority <chr> "P5", "P5", "P5", "P5", "P5", "P5", "P5", "P5", "P…
## $ product_id <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ rep_platform <chr> "PowerPC", "All", "All", "All", "All", "All", "All…
## $ reporter <int> 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ version <chr> "R 2.y.z", "old", "old", "old", "old", "old", "old…
## $ component_id <int> 7, 12, 14, 12, 9, 14, 12, 12, 16, 12, 14, 12, 11, …
## $ resolution <chr> "FIXED", "FIXED", "FIXED", "FIXED", "FIXED", "FIXE…
## $ target_milestone <chr> "---", "---", "---", "---", "---", "---", "---", "…
## $ qa_contact <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ status_whiteboard <chr> "", "", "", "", "", "", "", "", "", "", "", "", ""…
## $ votes <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ lastdiffed <chr> "2018-01-16 10:21:14", "1998-08-07 22:25:05", "199…
## $ everconfirmed <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ reporter_accessible <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ cclist_accessible <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ estimated_time <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ remaining_time <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ deadline <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
From the above table we can conclude that the few of the columns are having wrong datatype like:
- creation_ts
- delta_ts
- lastdiffed
- estimated_time
- remaining_time
- deadline
estimated_time and remaining_time only contains the integer value. So, It can’t be transformed to Date format datatype. Also there are columns which are empty so they are of no use of the analysis like:
- target_milestone
- qa_contact
- status_whiteboard
# Converting `bugs_df` to `dataframe`
bugs_df <- as.data.frame(bugs_df)## Warning in .local(conn, statement, ...): Decimal MySQL column 24 imported as
## numeric
## Warning in .local(conn, statement, ...): Decimal MySQL column 25 imported as
## numeric
Cleaning the data
First steps, check the data and prepare it for what we want:
#converting the required fields in the correct datatype format
bugs_df <- bugs_df %>%
mutate_at(vars("creation_ts", "delta_ts", "lastdiffed", "deadline"), as.Date)
# Taking the columns which are useful
bugs_df <- bugs_df %>%
select("bug_id", "bug_severity", "bug_status", "creation_ts", "delta_ts", "op_sys", "priority", "resolution", "component_id", "version", "lastdiffed", "deadline")
#for quick view of the datatypes and the structure of data
glimpse(bugs_df)## Rows: 7,042
## Columns: 12
## $ bug_id <int> 1, 3, 4, 5, 7, 8, 9, 10, 12, 13, 14, 15, 18, 19, 21, 22, …
## $ bug_severity <chr> "normal", "normal", "normal", "normal", "normal", "normal…
## $ bug_status <chr> "CLOSED", "CLOSED", "CLOSED", "CLOSED", "CLOSED", "CLOSED…
## $ creation_ts <date> 2010-02-15, 1998-08-07, 1998-08-10, 1998-08-13, 1998-08-…
## $ delta_ts <date> 2018-01-16, 1998-08-09, 1998-08-10, 1998-08-17, 1998-08-…
## $ op_sys <chr> "Mac OS X v10.4", "All", "Solaris", "Solaris", "Other", "…
## $ priority <chr> "P5", "P5", "P5", "P5", "P5", "P5", "P5", "P5", "P5", "P5…
## $ resolution <chr> "FIXED", "FIXED", "FIXED", "FIXED", "FIXED", "FIXED", "FI…
## $ component_id <int> 7, 12, 14, 12, 9, 14, 12, 12, 16, 12, 14, 12, 11, 11, 13,…
## $ version <chr> "R 2.y.z", "old", "old", "old", "old", "old", "old", "old…
## $ lastdiffed <date> 2018-01-16, 1998-08-07, 1998-08-10, 1998-08-13, 1998-08-…
## $ deadline <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#showing the `datatable`
datatable(head(bugs_df, 10), options = list(scrollX = TRUE))About the Bugs Data used for Analysis
I’ve taken the 12 columns under consideration to Analyse the Data. The brief discription about the columns as follows:- bug_id: The bug ID
- bug_severity: How severe the bug is, e.g. enhancement, critical
- bug_status: Current status, e.g. NEW, RESOLVED
- creation_ts: When bug was filed
- delta_ts: The timestamp of the last update on the bug. This includes updates to some related tables (e.g. “longdescs”).
- op_sys: Operating system bug was seen on, e.g. Windows Vista, Linux
- priority: The priority of the bug (P1 = most urgent, P5 = least urgent).
- resolution: The resolution, if the bug is in a closed state, e.g. FIXED, DUPLICATE
- component_id: Numeric ids of the components
- version: Version of software in which bug is seen
- lastdiffed: The time at which information about this bug changing was last emailed to the cc list.
- deadline: Date by which bug must be fixed
Visualizations
# Plotting the Time Series graph with the bug_id and creation_ts
bug_id <- bugs_df$bug_id
creation <- bugs_df$creation_ts
data <- data.frame(bug_id, creation)
fig1 <- plot_ly(data,
x = ~creation,
y = ~bug_id,
type = 'scatter',
mode = 'markers')
fig1## Warning: Ignoring 14 observations
# Plotting the Bar graph and adding Trace of Time-Series graph with bug_id and creation_ts to see the spread
fig1 <- plot_ly(data,
x = ~creation,
y = ~bug_id,
type = 'bar',
name = "bug_creation bar")
fig1 <- fig1 %>%
add_trace(fig1,
type = 'scatter',
mode='lines+markers',
name = "bug_creation Time_series")
fig1## Warning: Ignoring 14 observations
From the above the visualizations, The Time-series graph shows that which bug_id was filed in which month and year and from the bar graph we can conclude that in which year the most bugs are filed and when one will zoom the graphs, one can see on which date which bug was filed.
# Plotting the Time Series graph with the bug_id and delta_ts
delta <- bugs_df$delta_ts
data <- data.frame(bug_id, delta)
fig2 <- plot_ly(data,
x = ~delta,
y = ~bug_id,
type = 'scatter',
mode = 'markers')
fig2## Warning: Ignoring 30 observations
From the above the visualizations, The Time-series graph shows that which bug_id was last update.
# Plotting the Time Series graph with the bug_id and deadline
deadline <- bugs_df$deadline
data <- data.frame(bug_id, deadline)
fig3 <- plot_ly(data,
y = ~bug_id,
x = ~deadline,
type = 'scatter',
mode = 'markers')
fig3## Warning: Ignoring 7008 observations
From the above the visualizations, The Time-series graph shows that which bug_id was last update. Since, most of the bugs were not assigned the deadline to fix the bug.
# Plotting bar graph with bug_id and resolution
resolution <- bugs_df$resolution
data <- data.frame(bug_id, resolution)
fig4 <- plot_ly(data,
x = ~resolution,
y = ~bug_id,
type = 'bar')
fig4 From the above the visualizations, The Resolution bar-graph shows that which bug_id belongs to which category of resolution, if the bug is in a closed state, e.g. FIXED, DUPLICATE, etc. As we can conclude, that most bugs belongs to the fixed category of the resolution.
# Plotting bar graph with bug_id and bug_status
bug_status <- bugs_df$bug_status
data <- data.frame(bug_id, bug_status)
fig5 <- plot_ly(data,
x = ~bug_status,
y = ~bug_id,
type = 'bar')
fig5 From the above the visualizations, The bug_status bar-graph shows that which bug_id belongs to which category of bug_status, e.g. NEW, RESOLVED, etc. As we can conclude, that most bugs belongs to the closed category of the bug_status.
# Plotting bar graph with bug_id and bug_severity
bug_severity <- bugs_df$bug_severity
data <- data.frame(bug_id, bug_severity)
fig6 <- plot_ly(data,
x = ~bug_severity,
y = ~bug_id,
type = 'bar')
fig6 From the above the visualizations, The bug_severity bar-graph shows that which bug_id belongs to which category of bug_severity. Most of the bug which are filed are normal, some of the bugs need enhancements, minor and major and a very few bugs are filed under the blocker category.